Subspace Gaussian Mixture Models for Large Vocabulary Speech Recognition

Author

Liang Lu

Abstract

The subspace Gaussian mixture model (GMM) is an alternative approach to approximating the probability density function (p.d.f.) of a set of independent and identically distributed (i.i.d.) data using prior density estimates. In this approach, the prior density of the GMM parameters is estimated from a development dataset, and when modelling newly enrolled data, this prior knowledge can be exploited through criteria such as maximum a posteriori (MAP). Unlike conventional prior estimation methods for GMMs, this approach takes into account the correlations between the parameters of different Gaussian components. To handle the large parameter set while keeping the priors informative, the prior density estimation is constrained to a low-dimensional subspace of the whole model space that captures the main model variations. The subspace GMM has already been applied successfully to speaker recognition, where it achieved promising performance, but there has been little work on applying it to speech recognition. In this paper, we present a new framework for an HMM-based speech recognition system built on the subspace GMM, in which the parameters of the state-dependent GMMs are not estimated separately but are generated from a globally shared low-dimensional model subspace. This approach can considerably reduce the model size and, in addition, make the speech recognition system more scalable and adaptable. We first review the principles of the subspace GMM approach based on its applications in speaker recognition and then discuss how to extend it to speech recognition.
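To make the idea of generating state-dependent GMM parameters from a shared subspace concrete, here is a minimal sketch. The abstract does not give the exact parameterisation, so this assumes a commonly used form: each state has a low-dimensional vector, each Gaussian component has a globally shared projection matrix that maps that vector to a mean, and the mixture weights come from a softmax over shared weight vectors. All names and numbers (`expand_state`, `MEAN_PROJECTIONS`, `WEIGHT_PROJECTIONS`) are hypothetical and chosen for illustration only.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def expand_state(state_vector, mean_projections, weight_projections):
    """Derive one state's GMM means and weights from shared parameters.

    Assumed parameterisation (not taken from the abstract):
      mean of component i   = mean_projections[i] @ state_vector
      weight of component i = softmax_i(weight_projections[i] . state_vector)
    """
    means = [matvec(m_i, state_vector) for m_i in mean_projections]
    scores = [sum(w * x for w, x in zip(w_i, state_vector))
              for w_i in weight_projections]
    return means, softmax(scores)


# Globally shared parameters: 2 components, feature dim 3, subspace dim 2.
MEAN_PROJECTIONS = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0], [1.0, -1.0]],
]
WEIGHT_PROJECTIONS = [[1.0, 0.0], [0.0, 1.0]]

# A state is described only by its 2-dimensional vector.
means, weights = expand_state([0.5, 0.5], MEAN_PROJECTIONS, WEIGHT_PROJECTIONS)
print(means)
print(weights)
```

Under this assumed form, each state stores only its low-dimensional vector, while the projections are shared across all states, which is one way the reduction in model size described above could arise.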


Similar resources

Microsoft Word - Hybridmodel2.dot

Today’s state-of-the-art speech recognition systems typically use continuous density hidden Markov models with mixture of Gaussian distributions. Such speech recognition systems have problems; they require too much memory to run, and are too slow for large vocabulary applications. Two approaches are proposed for the design of compact acoustic models, namely, subspace distribution clustering hid...


Application of Subspace Gaussian Mixture Models in Contrastive Acoustic Scenarios

This paper describes experimental results of applying Subspace Gaussian Mixture Models (SGMMs) in two completely diverse acoustic scenarios: (a) for Large Vocabulary Continuous Speech Recognition (LVCSR) task over (well-resourced) English meeting data and, (b) for acoustic modeling of underresourced Afrikaans telephone data. In both cases, the performance of SGMM models is compared with a conve...


Combating reverberation in large vocabulary continuous speech recognition

Reverberation leads to high word error rates (WERs) for automatic speech recognition (ASR) systems. This work presents robust acoustic features motivated by subspace modeling and human speech perception for use in large vocabulary continuous speech recognition (LVCSR). We explore different acoustic modeling strategies and language modeling techniques, and demonstrate that robust features with a...


Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...


Large vocabulary conversational speech recognition with a subspace constraint on inverse covariance matrices

This paper applies the recently proposed SPAM models for acoustic modeling in a Speaker Adaptive Training (SAT) context on large vocabulary conversational speech databases, including the Switchboard database. SPAM models are Gaussian mixture models in which a subspace constraint is placed on the precision and mean matrices (although this paper focuses on the case of unconstrained means). They i...




Publication date: 2010
